The Shapley Value of Classifiers in Ensemble Games
What is the value of an individual model in an ensemble of binary
classifiers? We answer this question by introducing a class of transferable
utility cooperative games called \textit{ensemble games}. In machine learning
ensembles, pre-trained models cooperate to make classification decisions. To
quantify the importance of models in these ensemble games, we define
\textit{Troupe} -- an efficient algorithm which allocates payoffs based on
approximate Shapley values of the classifiers. We argue that the Shapley value
of models in these games is an effective decision metric for choosing a
high-performing subset of models from the ensemble. Our analytical findings prove
that our Shapley value estimation scheme is precise and scalable; its
performance increases with the size of the dataset and ensemble. Empirical results
on real world graph classification tasks demonstrate that our algorithm
produces high quality estimates of the Shapley value. We find that Shapley
values can be utilized for ensemble pruning, and that adversarial models
receive a low valuation. Complex classifiers are frequently found to be
responsible for both correct and incorrect classification decisions.
Comment: Source code is available here:
https://github.com/benedekrozemberczki/shaple
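To make the idea of approximate Shapley values for ensemble members concrete, here is a minimal sketch. It is not the Troupe algorithm from the abstract. It uses plain permutation sampling on a weighted voting game, which is an assumption made for illustration. The function name `shapley_weighted_voting`, the `quota` parameter, and the example weights are all hypothetical.

```python
import random


def shapley_weighted_voting(weights, quota, n_permutations=2000, seed=0):
    """Estimate Shapley values in a weighted voting game by permutation sampling.

    Assumption (not from the abstract): a coalition of models "wins" with
    value 1 when its total weight reaches `quota`, and has value 0 otherwise.
    A player's estimate is the fraction of sampled orderings in which adding
    that player turns a losing coalition into a winning one.
    """
    rng = random.Random(seed)
    players = list(range(len(weights)))
    pivotal_counts = [0] * len(weights)
    for _ in range(n_permutations):
        rng.shuffle(players)
        running = 0.0
        for p in players:
            running += weights[p]
            if running >= quota:
                # Player p is pivotal in this ordering; later players add nothing.
                pivotal_counts[p] += 1
                break
    return [count / n_permutations for count in pivotal_counts]


# Hypothetical vote weights for four models on one data point.
estimates = shapley_weighted_voting([3, 2, 2, 1], quota=5)
print(estimates)
```

When the full ensemble wins, exactly one player is pivotal in every ordering, so the estimates sum to 1. A model with a low estimate rarely changes the outcome, which is the intuition behind using such scores for pruning.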
Multi-scale attributed node embedding
We present network embedding algorithms that capture information about a node
from the local distribution over node attributes around it, as observed over
random walks following an approach similar to Skip-gram. Observations from
neighborhoods of different sizes are either pooled (AE) or encoded distinctly
in a multi-scale approach (MUSAE). Capturing attribute-neighborhood
relationships over multiple scales is useful for a diverse range of
applications, including latent feature identification across disconnected
networks with similar attributes. We prove theoretically that matrices of
node-feature pointwise mutual information are implicitly factorized by the
embeddings. Experiments show that our algorithms are robust, computationally
efficient and outperform comparable models on social networks and web graphs.
Comment: Published in the Journal of Complex Network
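As a rough illustration of what a node-feature pointwise mutual information matrix is, the sketch below computes PMI from a small table of co-occurrence counts. It is not the embedding algorithm from the abstract. The function name `pmi_matrix`, the count table, and the convention of mapping zero counts to 0 are assumptions made for this example. In practice the counts would come from node-feature observations such as those gathered over random walks.

```python
import math


def pmi_matrix(counts):
    """Pointwise mutual information for a nodes-by-features count table.

    PMI(n, f) = log( c(n, f) * total / (c(n) * c(f)) ), where c(n) and c(f)
    are row and column sums. Cells with zero count are set to 0.0, which is
    a convention chosen here to avoid log(0), not something the source states.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    pmi = []
    for i, row in enumerate(counts):
        pmi_row = []
        for j, c in enumerate(row):
            if c == 0:
                pmi_row.append(0.0)
            else:
                pmi_row.append(math.log(c * total / (row_sums[i] * col_sums[j])))
        pmi.append(pmi_row)
    return pmi


# Hypothetical counts: rows are nodes, columns are features.
counts = [[4, 0, 1],
          [1, 3, 0]]
for row in pmi_matrix(counts):
    print([round(value, 3) for value in row])
```

Positive entries mark node-feature pairs that co-occur more often than independence would predict. Negative entries mark pairs that co-occur less often.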
Explainable Biomedical Recommendations via Reinforcement Learning Reasoning on Knowledge Graphs
For Artificial Intelligence to have a greater impact in biology and medicine,
it is crucial that recommendations are both accurate and transparent. In other
domains, a neurosymbolic approach of multi-hop reasoning on knowledge graphs
has been shown to produce transparent explanations. However, there is a lack of
research applying it to complex biomedical datasets and problems. In this
paper, the approach is explored for drug discovery to draw solid conclusions on
its applicability. For the first time, we systematically apply it to multiple
biomedical datasets and recommendation tasks with fair benchmark comparisons.
The approach is found to outperform the best baselines by 21.7% on average
whilst producing novel, biologically relevant explanations.
Chickenpox Cases in Hungary: A Benchmark Dataset for Spatiotemporal Signal Processing with Graph Neural Networks
Recurrent graph convolutional neural networks are highly effective machine
learning techniques for spatiotemporal signal processing. Newly proposed graph
neural network architectures are repeatedly evaluated on standard tasks such
as traffic or weather forecasting. In this paper, we propose the Chickenpox
Cases in Hungary dataset as a new dataset for comparing graph neural network
architectures. Our time series analysis and forecasting experiments demonstrate
that the Chickenpox Cases in Hungary dataset is adequate for comparing the
predictive performance and forecasting capabilities of novel recurrent graph
neural network architectures.